Applying WebTables in Practice
Authors
Abstract
We started investigating the collection of HTML tables on the Web and developed the WebTables system a few years ago [4]. Since then, our work has been motivated by applying WebTables in a broad set of applications at Google, resulting in several product launches. In this paper, we describe the challenges faced, lessons learned, and new insights that we gained from our efforts. The main challenges we faced in our efforts were (1) identifying tables that are likely to contain high-quality data (as opposed to tables used for navigation, layout, or formatting), and (2) recovering the semantics of these tables or signals that hint at their semantics. The result is a semantically enriched table corpus that we used to develop several services. First, we created a search engine for structured data whose index includes over a hundred million HTML tables. Second, we enabled users of Google Docs (through its Research Panel) to find relevant data tables and to insert such data into their documents as needed. Most recently, we brought WebTables to a much broader audience by using the table corpus to provide richer tabular snippets for fact-seeking web search queries on Google.com.
Similar resources
Surveillance and monitoring: a vital investment for the changing burdens of disease.
Evidence based medicine in nuclear medicine practice; Part II: Appraising and applying the evidence
As described in the first part of this article, Evidence Based Medicine (EBM) is a growing part of medical practice that emphasizes the best evidence. The first part of this review described how to find this evidence by formulating an answerable question and devising search strategies. In this part, appraising the retrieved article (with the main focus on the diag...
QUICK: Expressive and Flexible Search over Knowledge Bases and Text Collections
Recent work on Web-extracted data sets has produced an interesting new source of structured Web data. These data sets can be viewed as knowledge bases (KB) – large heterogeneous linked entity collections with millions of unique edge and node labels, often encoding rich semantic information over entities. For example, YAGO [5] and ExDB [2] have fact collections numbering in the tens and hundreds...
Schema Extraction for Tabular Data on the Web
Tabular data is an abundant source of information on the Web, but remains mostly isolated from the latter’s interconnections since tables lack links and computer-accessible descriptions of their structure. In other words, the schemas of these tables — attribute names, values, data types, etc. — are not explicitly stored as table metadata. Consequently, the structure that these tables contain is...
Uncovering the Relational Web
The World-Wide Web consists of a huge number of unstructured hypertext documents, but it also contains structured data in the form of HTML tables. Many of these tables contain both relational-style data and a small “schema” of labeled and typed columns, making each such table a small structured database. The WebTables project is an effort to extract and make use of the huge number of these stru...